CLAIMS 



1. A method for clustering queries, the method comprising: 
identifying a same document and/or a plurality of similar documents 
selected by a user in response to a plurality of queries; and 

responsive to identifying the same document and/or the similar documents, 
generating a query cluster to indicate that the queries are similar independent of 
whether individual ones of the queries comprise similar composition with respect 
to other ones of the queries. 
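
By way of a non-limiting illustration of claim 1, the following Python sketch groups queries whenever they share at least one selected document, without looking at the words in the queries. The click-log format of (query, document) pairs, the function name `cluster_queries`, and the use of a union-find structure are assumptions made for illustration only and are not recited in the claim.

```python
from collections import defaultdict


def cluster_queries(click_log):
    """Group queries that share at least one selected document.

    click_log: iterable of (query, document_id) pairs (hypothetical format).
    Returns a list of sets, one set of queries per cluster.
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_query_for_doc = {}
    for query, doc in click_log:
        parent.setdefault(query, query)
        if doc in first_query_for_doc:
            # Same document selected for two queries: merge their clusters.
            union(first_query_for_doc[doc], query)
        else:
            first_query_for_doc[doc] = query

    clusters = defaultdict(set)
    for query in parent:
        clusters[find(query)].add(query)
    return list(clusters.values())
```

In this sketch the composition of the queries is never inspected, so two queries with no words in common still fall into one cluster when the same document was selected for both.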

2. A method as recited in claim 1, wherein the queries comprise a well 
formed natural language question, a keyword, or a phrase. 

3. A method as recited in claim 1, wherein the query cluster is used to 
disambiguate a word or phrase in a query of the queries. 

4. A method as recited in claim 1, further comprising determining that 
the queries are similar based on similar keyword or phrase composition. 






5. A method as recited in claim 1, wherein identifying the same 
document and/or the similar documents further comprises: 

determining the similar documents by evaluating a set of selected similar 
documents chosen responsive to queries p and q of the queries, wherein 
documents D_C(.) is a subset of a result list D(.) according to the following: 
$D_C(p) = \{d_{p1}, d_{p2}, \ldots, d_{pi}\} \subseteq D(p)$ 
$D_C(q) = \{d_{q1}, d_{q2}, \ldots, d_{qj}\} \subseteq D(q)$; 

wherein similarity based on selection of documents is based on: 

if $D_C(p) \cap D_C(q) = \{d_{pq1}, d_{pq2}, \ldots, d_{pqk}\} \neq \emptyset$, then documents 
$d_{pq1}, d_{pq2}, \ldots, d_{pqk}$ represent a set of common topics of queries p and q, and, 

whereby the similar documents between queries p and q is determined by 
$D_C(p) \cap D_C(q)$. 

6. A method as recited in claim 1, further comprising constructing a 
thesaurus comprising a plurality of synsets, wherein each synset comprises one or 
more query clusters. 

7. A method as recited in claim 1, wherein identifying the same 
document and/or the similar documents further comprises determining the similar 
documents based on a proportionality of commonly selected individual 
documents. 

8. A method as recited in claim 7, wherein identifying the same 
document and/or the similar documents further comprises: 

determining the similar documents based on a proportionality of commonly 
selected individual documents, such that: 

$\mathrm{similarity}(p, q) = \dfrac{RD(p, q)}{\mathrm{Max}(rd(p), rd(q))}$, 

wherein rd(.) is the number of clicked documents for a query of the queries, 
and wherein RD(p, q) is the number of document selections in common. 
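
As a hedged illustration of the ratio recited in claim 8, the following Python sketch divides the number of commonly selected documents by the larger of the two per-query selection counts. The function name `click_similarity`, the choice to count distinct documents, and the return value of 0.0 when a query has no selections are assumptions for illustration and are not recited in the claim.

```python
def click_similarity(clicked_p, clicked_q):
    """Return RD(p, q) / Max(rd(p), rd(q)) for two collections of clicked documents."""
    docs_p, docs_q = set(clicked_p), set(clicked_q)
    if not docs_p or not docs_q:
        # Assumption: a query with no selections gives no evidence of similarity.
        return 0.0
    common = docs_p & docs_q  # documents selected for both queries
    return len(common) / max(len(docs_p), len(docs_q))
```

For example, with three documents selected for p, four for q, and two in common, the sketch yields 2 / 4 = 0.5.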



9. A method as recited in claim 1, wherein identifying the same 
document and/or the similar documents further comprises: 

determining the similar documents based on a hierarchical positioning 
between individual ones of a plurality of documents commonly selected across the 
queries. 

10. A method as recited in claim 9: 

wherein $F(d_i, d_j)$ is a lowest common parent node for documents $d_i$ and $d_j$; 
wherein L(x) is a level of a node x; 

wherein $L_{\mathrm{Total}}$ identifies a total number of levels in a hierarchy; and 

wherein a similarity between two documents is defined as follows: 

$s(d_i, d_j) = \dfrac{L(F(d_i, d_j)) - 1}{L_{\mathrm{Total}} - 1}$, such that 

$s(d_i, d_i) = 1$; and $s(d_i, d_j) = 0$ if $F(d_i, d_j) = \mathrm{root}$; and 
the method further comprises: 

incorporating $s(d_i, d_j)$ into a calculation of query similarity, wherein 
$d_i$ ($1 \le i \le m$) and $d_j$ ($1 \le j \le n$) are the sets of selected documents for 
queries p and q, respectively, such that: 

$\mathrm{similarity}(p, q) = \dfrac{1}{2} \left( \dfrac{\sum_{i=1}^{m} \max_{j=1}^{n} s(d_i, d_j)}{rd(p)} + \dfrac{\sum_{j=1}^{n} \max_{i=1}^{m} s(d_i, d_j)}{rd(q)} \right)$ 
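
As a hedged illustration of claim 10, the following Python sketch computes the document-level similarity from the level of the lowest common parent node and then combines it into a query-level score. The child-to-parent dictionary, the function names, the convention that the root is at level 1, the assumption that documents are leaf nodes of a single-rooted hierarchy, and the reading of the query-level combination as the average of the two normalized sums are all assumptions for illustration and are not recited in the claim.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def doc_similarity(di, dj, parent, total_levels):
    """s(di, dj) = (L(F(di, dj)) - 1) / (L_Total - 1), with s(d, d) = 1."""
    if di == dj:
        return 1.0
    above_dj = set(ancestors(dj, parent))
    # F(di, dj): first node above di that also lies on dj's path to the root.
    common = next(n for n in ancestors(di, parent)[1:] if n in above_dj)
    level = len(ancestors(common, parent))  # the root has level 1
    return (level - 1) / (total_levels - 1)


def query_similarity(docs_p, docs_q, parent, total_levels):
    """Average, over both queries, of each document's best match in the other set."""
    if not docs_p or not docs_q:
        return 0.0  # assumption: no selections means no evidence of similarity
    best_p = sum(
        max(doc_similarity(di, dj, parent, total_levels) for dj in docs_q)
        for di in docs_p
    )
    best_q = sum(
        max(doc_similarity(di, dj, parent, total_levels) for di in docs_p)
        for dj in docs_q
    )
    return 0.5 * (best_p / len(docs_p) + best_q / len(docs_q))
```

Under these assumptions, two documents whose lowest common parent is the root contribute 0, and documents that share a deeper parent contribute a value closer to 1.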



11. Computer-readable media comprising computer-executable 
instructions for identifying similar queries, the computer-executable instructions 
comprising instructions for: 

identifying a same document and/or a plurality of similar documents 
selected by a user in response to a plurality of queries; and 

responsive to identifying the same document and/or the similar documents, 
generating a query cluster to indicate that the queries are similar independent of 
whether individual ones of the queries comprise similar composition with respect 
to other ones of the queries. 

12. Computer-readable media as recited in claim 11, wherein the queries 
comprise a well formed natural language question, a keyword, or a phrase. 

13. Computer-readable media as recited in claim 11, wherein the query 
cluster is used to disambiguate a word or phrase in a query of the queries. 

14. Computer-readable media as recited in claim 11, wherein the 
computer-executable instructions further comprise instructions for determining 
that the queries are similar based on similar keyword or phrase composition. 

15. Computer-readable media as recited in claim 11, wherein the 
instructions for identifying the same document and/or the similar documents 
further comprise instructions for: 

determining the similar documents by evaluating a set of selected similar 
documents chosen responsive to queries p and q of the queries, wherein 
documents D_C(.) is a subset of a result list D(.) according to the following: 
$D_C(p) = \{d_{p1}, d_{p2}, \ldots, d_{pi}\} \subseteq D(p)$ 
$D_C(q) = \{d_{q1}, d_{q2}, \ldots, d_{qj}\} \subseteq D(q)$; 

wherein similarity based on selection of documents is based on: 

if $D_C(p) \cap D_C(q) = \{d_{pq1}, d_{pq2}, \ldots, d_{pqk}\} \neq \emptyset$, then documents 
$d_{pq1}, d_{pq2}, \ldots, d_{pqk}$ represent a set of common topics of queries p and q, and, 

whereby the similar documents between queries p and q is determined by 
$D_C(p) \cap D_C(q)$. 

16. Computer-readable media as recited in claim 11, wherein the 
computer-executable instructions further comprise instructions for constructing a 
thesaurus comprising a plurality of synsets, wherein each synset comprises one or 
more query clusters. 



17. Computer-readable media as recited in claim 11, wherein the 
instructions for identifying the same document and/or the similar documents 
further comprise instructions for determining the similar documents based on a 
proportionality of commonly selected individual documents. 



18. Computer-readable media as recited in claim 17, wherein the 
instructions for identifying the same document and/or the similar documents 
further comprise instructions for: 

determining the similar documents based on a proportionality of commonly 
selected individual documents, such that: 

$\mathrm{similarity}(p, q) = \dfrac{RD(p, q)}{\mathrm{Max}(rd(p), rd(q))}$, 

wherein rd(.) is the number of clicked documents for a query of the queries, 
and wherein RD(p, q) is the number of document selections in common. 

19. Computer-readable media as recited in claim 11, wherein the 
instructions for identifying the same document and/or the similar documents 
further comprise instructions for: 

determining the similar documents based on a hierarchical positioning 
between individual ones of a plurality of documents commonly selected across the 
queries. 



20. Computer-readable media as recited in claim 19: 

wherein $F(d_i, d_j)$ is a lowest common parent node for documents $d_i$ and $d_j$; 

wherein L(x) is a level of a node x; 

wherein $L_{\mathrm{Total}}$ identifies a total number of levels in a hierarchy; and 

wherein a similarity between two documents is defined as follows: 

$s(d_i, d_j) = \dfrac{L(F(d_i, d_j)) - 1}{L_{\mathrm{Total}} - 1}$, such that 

$s(d_i, d_i) = 1$; and $s(d_i, d_j) = 0$ if $F(d_i, d_j) = \mathrm{root}$; and 
wherein the computer-executable instructions further comprise instructions 
for: 

incorporating $s(d_i, d_j)$ into a calculation of query similarity, wherein 
$d_i$ ($1 \le i \le m$) and $d_j$ ($1 \le j \le n$) are the sets of selected documents for 
queries p and q, respectively, such that: 

$\mathrm{similarity}(p, q) = \dfrac{1}{2} \left( \dfrac{\sum_{i=1}^{m} \max_{j=1}^{n} s(d_i, d_j)}{rd(p)} + \dfrac{\sum_{j=1}^{n} \max_{i=1}^{m} s(d_i, d_j)}{rd(q)} \right)$ 

21. A computing device comprising: 

a processor coupled to a memory, the memory comprising 
computer-executable instructions, the processor being configured to fetch and 
execute the computer-executable instructions for: 

identifying a same document and/or a plurality of similar documents 
selected by a user in response to a plurality of queries; and 

responsive to identifying the same document and/or the similar 
documents, generating a query cluster to indicate that the queries are similar 
independent of whether individual ones of the queries comprise similar 
composition with respect to other ones of the queries. 

22. A computing device as recited in claim 21, wherein the queries 
comprise a well formed natural language question, a keyword, or a phrase. 

23. A computing device as recited in claim 21, wherein the query cluster 
is used to disambiguate a word or phrase in a query of the queries. 

24. A computing device as recited in claim 21, wherein the 
computer-executable instructions further comprise instructions for determining 
that the queries are similar based on similar keyword or phrase composition. 

25. A computing device as recited in claim 21, wherein the instructions 
for identifying the same document and/or the similar documents further comprise 
instructions for: 

determining the similar documents by evaluating a set of selected similar 
documents chosen responsive to queries p and q of the queries, wherein 
documents D_C(.) is a subset of a result list D(.) according to the following: 
$D_C(p) = \{d_{p1}, d_{p2}, \ldots, d_{pi}\} \subseteq D(p)$ 
$D_C(q) = \{d_{q1}, d_{q2}, \ldots, d_{qj}\} \subseteq D(q)$; 

wherein similarity based on selection of documents is based on: 

if $D_C(p) \cap D_C(q) = \{d_{pq1}, d_{pq2}, \ldots, d_{pqk}\} \neq \emptyset$, then documents 
$d_{pq1}, d_{pq2}, \ldots, d_{pqk}$ represent a set of common topics of queries p and q, and, 

whereby the similar documents between queries p and q is determined by 
$D_C(p) \cap D_C(q)$. 

26. A computing device as recited in claim 21, wherein the 
computer-executable instructions further comprise instructions for constructing a 
thesaurus comprising a plurality of synsets, wherein each synset comprises one or 
more query clusters. 

27. A computing device as recited in claim 21, wherein the instructions 
for identifying the same document and/or the similar documents further comprise 
instructions for determining the similar documents based on a proportionality of 
commonly selected individual documents. 

28. A computing device as recited in claim 27, wherein the instructions 
for identifying the same document and/or the similar documents further comprise 
instructions for: 

determining the similar documents based on a proportionality of commonly 
selected individual documents, such that: 

$\mathrm{similarity}(p, q) = \dfrac{RD(p, q)}{\mathrm{Max}(rd(p), rd(q))}$, 

wherein rd(.) is the number of clicked documents for a query of the queries, 
and wherein RD(p, q) is the number of document selections in common. 

29. A computing device as recited in claim 21, wherein the instructions 
for identifying the same document and/or the similar documents further comprise 
instructions for: 

determining the similar documents based on a hierarchical positioning 
between individual ones of a plurality of documents commonly selected across the 
queries. 

30. A computing device as recited in claim 29: 

wherein $F(d_i, d_j)$ is a lowest common parent node for documents $d_i$ and $d_j$; 

wherein L(x) is a level of a node x; 

wherein $L_{\mathrm{Total}}$ identifies a total number of levels in a hierarchy; and 

wherein a similarity between two documents is defined as follows: 

$s(d_i, d_j) = \dfrac{L(F(d_i, d_j)) - 1}{L_{\mathrm{Total}} - 1}$, such that 

$s(d_i, d_i) = 1$; and $s(d_i, d_j) = 0$ if $F(d_i, d_j) = \mathrm{root}$; and 
wherein the computer-executable instructions further comprise instructions 
for: 

incorporating $s(d_i, d_j)$ into a calculation of query similarity, wherein 
$d_i$ ($1 \le i \le m$) and $d_j$ ($1 \le j \le n$) are the sets of selected documents for 
queries p and q, respectively, such that: 